MMmalloc Alpha Release V.23 Gianni Mariani 7-Apr-1994 Intro ----- This library, pronounced "M" "M" malloc, is a plug in replacement of the "libmalloc" library ( see malloc(3X) ). In general it is faster, more memory efficient and has better functionality than the original libmalloc. A major feature is to be able to recycle memory so as to reduce swapping and swap requirements. Compared to the recent version of libc's malloc it is faster with comparable memory usage and aging characteristics. It is important to note that all comments on other libraries or features in this document will refer to IRIX 5.2 and not to earlier IRIX releases. Feature Summary --------------- MMmalloc feature set brief. . Plug replacement for /lib/libmalloc.so. ( see malloc(3X) ) . Faster than libmalloc and libc's malloc. . More memory efficient than libmalloc. . Memory recycle feature. . Highly tunable yet easy to use defaults. . Tuned realloc performance. . Malloc trash debugging support. . Better shared memory arena support. . Better memalign support. . Tunable alignment of addresses returned by malloc. Performace of any malloc library is highly dependent on the pattern of memory usage by the application. Relative performance of various malloc's can vary wildly even on different data sets in the same application so performance quotes are highly subjective. However most malloc's have none or limited ways of tuning while MMmalloc is very flexible. Where is MMmalloc suited ? -------------------------- MMmalloc suits larger long lived applications with large cyclic memory usage. Example are the X Server, image processing applications, large database servers and larger Motif applications. Where is MMmalloc NOT suited ? ------------------------------ Short lived applications with small dynamic memory usage. In these cases you should stick it out with libc since it has excellent malloc characteristics for this sort of application. Files in this Directory ----------------------- README : this->README MMinstall : csh script to install MMmalloc onto an IRIX 5.2 system MMfire : sh script to make all armed processes call recycle MMdbx : dbx script to dump information about an MMmalloc error libmalloc.so : the MMmalloc libmalloc plug compatible dso libamalloc.so : the MMmalloc amalloc library ( see amalloc(3P) ) MMmalloc.h : additional MMmalloc defines to compliment <malloc.h> How to use MMmalloc ------------------- MMmalloc can be installed as the default libmalloc library for a system. This will effect all applications linked with libmalloc. To install MMmalloc you can use MMinstall in the same directory you found this README file. Simply invoke MMinstall as root and any applications linked with libmalloc (-lmalloc) will now use MMmalloc. MMinstall will place the original libmalloc in /lib/libmalloc.so.orig and the MMmalloc library will be /lib/libmalloc.so . If you are less inclined to adventure you can also set the environment variable LD_LIBRARY_PATH to the directory that contains the MMmalloc library however you will need to create /var/malloc/ArmMMmalloc yourself if you wish to try MMfire. See rld(1) for more detail. You may also use MMmalloc with ELF binaries not linked with -lmalloc by setting the environment variable _RLD_LIST as below. _RLD_LIST=libmalloc.so:DEFAULT export _RLD_LIST A small shell script named MMfire will send a SIGRTMIN signal to all processes that are currently are armed to recycle(). The contents of the file /tmp/MMmalloc.log will contains a recycle report if it is enabled. The signal used may be changed in the future. Do not create a production application that depends on MMmalloc using SIGRTMIN for recycle as it will likely fail in future releases of MMmalloc. MMfire will kill some applications that reset the SIGRTMIN signal (e.g. 4Dwm or Xsgi). MMfire usage : MMfire : fire recycle and echo lines added to /tmp/MMmalloc.log MMfire -s : silent operation MMfire -v : List process armed and fire recycle. MMfire -u : Don't fire recycle but list processes currently armed. MMdbx contains a simple set of dbx commands do print malloc_check information if it is available. If values printed are (nil) then malloc_check did not detect a corruption. See more information on malloc_check and MALLOC_DEBUG below. Description ----------- The algorithm used in MMmalloc is similar on the surface to libmalloc. The libmalloc algorithm uses two types of allocators. One is a "quick" allocator for small allocations of equal size and the other a more traditional free list search allocator. There are a set of quick allocators, each dedicated to requests of a single size ( sizes are rounded up to some power of 2 ) each quick allocator will allocate a large block which it will use to satisfy multiple smaller requests. The normal allocator in libmalloc is a non-exhaustive linear free list search allocator. MMmalloc has a similar two staged approach with totally new allocation and deallocation algorithms. MMmalloc's "quick" allocator is adaptive. Most applications will make many malloc requests for a small set of sizes. The "quick" allocator will only be activated after a threshold number of requests for a particular size have been issued. This saves memory from avoiding the allocation of mostly empty large quick allocator blocks for rarely used sizes. Another adaptiveness is that the quick block size will grow as the number of requests for a size increases to minimize the amount of time allocating and freeing quick allocator blocks. One major advantage over libmalloc is that MMmalloc quick allocator blocks are free'd to the normal allocator once all quick allocations in a quick block are free'd while libmalloc will always retain a quick allocator block for a quick allocator of a particular size. This flexibility makes MMmalloc quick allocators suited for larger sizes with less waste than libmalloc. Quick allocators also have a 4 byte header overhead for each allocation so they may use less memory than the normal allocator which has an 8 byte overhead. ( 64 bit systems have 8/16 byte overhead) Improvements in the normal allocator come from the manipulation of multiple free lists. libmalloc basically has three free lists. One list for free blocks too small and hence ignored by malloc but coalescable by free, one for the most recently sbrk'd block and the main long double linked list of free blocks. MMmalloc can utilize hundreds of free lists in a three tiered linear sized bucket configuration. Indeed the algorithm becomes an exercise in efficient management of free lists. For example, the default configuration of MMmalloc has 130 free lists, the first 120 lists cover sizes of multiples of 32, the next 7 lists are multiples of 4096 and the last 3 are multiples of 262144 with the last list taking all the rest. Each free list behaves in a similar fashion to the main libmalloc free list. An allocation attempt will check a number of the first elements on the free list but not exhaustive, in this regard it is the same as libmalloc, however, libmalloc will request a new block from the system at this point while MMmalloc will grab the first block from the next largest referenced non-empty free list and partition it as required. The trick to performance is keeping the free lists consistent is to NOT keep them optimal all the time. Free lists reference the next largest and usually non-empty "pivotal" free lists to avoid searching all the lists all the time. The pivot lists are partially rehashed ( recomputed ) once a number of allocations have been made on a "dirty" pivot. This means that the free block found is not always optimal but "most times highly" optimal. MMmalloc realloc performance for small sizes is optimized since it uses it's own memcpy routine that assumes double word and non-overlapped data. Since it is difficult to extend sizes of quick blocks, MMmalloc realloc will promote 'quick' allocations to normal allocations based on some threshold destination size constant. One major difference between most other mallocs and MMmalloc is method of coalescing blocks being freed. At free() time, most other mallocs will coalesce adjacent free'd blocks ( forward as well as backward ) however, MMmalloc will take short cuts and only coalesce free'd blocks directly after the one being free'd. This reduces the amount of effort spent coalescing and partitioning blocks. Depending on some threshold conditions, MMmalloc will coalesce selected free blocks when it has exhausted searches. This helps the multiple free list algorithm so as to distribute free blocks to the many free lists hence reducing the time rehashing dirty pivots as well as partitioning large free blocks. As mentioned earlier, MMmalloc allows the application to unmap unused pages. This is performed by the introduction of the "recycle()" entry point. Recycle coalesces free blocks and performs garbage collection and recycling ( return to the operating system ). This creates yet another level to the allocation hierarchy being the unmapped allocator which has similar properties to the main MMmalloc allocator but dedicated to unmapped or `untouchable' pages. There is also a re-entrant version of recycle named rerecycle() that is safe to be called by a signal handler. On return from rerecycle it is not guaranteed that recycling has actually been performed but that recycling will be performed if it safe to do so or by the process currently inside MMmalloc's routines before it returns. rerecycle also checks to see if it is worthwhile to call recycle by checking the amount of memory returned by free() since the last time it was called to avoid an unnecessary walk of the arena. MMmalloc's memalign is persistent. A block returned by memalign will retain alignment even after a realloc if it's new size permits it to be. MMmalloc also has a sibling library to emulate the amalloc(3P) interface. libamalloc.so uses MMmalloc's libmalloc.so for it's main allocation routines instead of providing a completely different set. Also available is the ability to add memory to your arena with the aaddmem function. This is useful for extending the size of an arena when you have unrelated processes sharing an arena rather than using the "grow" callback function which may not be in the same address space for every process. Additional Features ------------------- MMmalloc is sensitive to the environment variable MALLOC_DEBUG. Setting MALLOC_DEBUG to 1 will force MMmalloc to check the arena for consistentcy on every call to the library. Setting this to 1 makes calls the MMmalloc very slow since the entire arena is checked for consistentcy on every call. This is the same as calling mallopt(M_DEBUG,1). Successfull return when debug mode is on will guarantee that the malloc data structures are consistent. malloc_check() can also be called at any time by the application to validate arena consistentcy. This version of MMmalloc also has a built in signal handler attached to SIGRTMIN that is armed if the file /var/malloc/ArmMMmalloc exists and is readable and the environment variable MALLOC_NOARM is not set. The file /var/malloc/ArmMMmalloc is kept open so that it can easily be identified which processes are currently armed by fuser(1M). This is intended for use by an external swap monitor daemon that will send SIGRTMIN signals to the armed processes. On receipt of a SIGRTMIN, the application will be forced to call recycle and a report of memory recycled will be placed in /tmp/MMmalloc.log if enabled. The logging is enabled by default but can be disabled by the second character in the MALLOC_DEBUG variable. To disable malloc debugging and recycle logs use MALLOC_DEBUG="00". To enable recycle logs only use MALLOC_DEBUG="01". To enable both use MALLOC_DEBUG="11". The MALLOC_CONFIG environment variable allows the selection of a configuration of constants that MMmalloc utilizes in it's algorithms. MALLOC_CONFIG can be set between 1 and 4 with the following meanings. 1 - default - fast and memory efficient 2 - faster and memory hog 3 - slower and more memory efficient 4 - tuned for libil and fast big memory hog with good aging The MALLOC_CONFIG default can be set by the application by setting a global variable int _mm_defaultoptions = N; Where N is 1-4 as above. MALLOC_CONFIG can also be used to tweak the configuration being used. The intention is to allow the programmer to tweak the configuration during an applications optimization. It is not intended as an ugly interface to the malloc library and should not be used in production code as it will likely change in the future. The string in MALLOC_CONFIG can be a colon ( : ) separated list of configuration parameters. The parameter name is followed by an equal ( = ) and a numeric string that will be used as it's parameter. No white space is tolerated. The parameters are not totally independent and bad values will cause MMmalloc to fail. The following example will set the mm_ntopromo parameter to 300 and mm_minslots to 256 and use the default values from config 1. Numbers with a leading '0' are assumed to be base 8 and '0x' are base 16. MALLOC_CONFIG="1:mm_ntopromo=300:mm_minslots=0x100" export MALLOC_CONFIG <run application> Care should be taken when choosing new configuration values as they are interdependent. The following are the MALLOC_CONFIG parameters. Values in ()'s are the settings for the default configurations 1-4 respectively. mm_asizshft (3 3 3 4) This is the size requested granularity adjustment parameter. All malloc requests will be rounded upwards to the nearest multiple of 2^mm_asizshft. It is bad to have mm_asizshft less than 3. e.g. mm_asizshft=4 will round all requests to the nearest multiple of 16. If mm_nszmult=16 it will guarantee that all allocations will be returned on a 16 byte boundary. Quick allocator slots will also be based on multiples of this size. mm_asizshft should not be set to less than 2 on 32 bit systems and 3 on 64 bit systems. mm_asizshft also determines the alignment of pointers returned by malloc and realloc. mm_ntopromo (20 10 0 0) This is the number of requests required for a quick allocator before actually activating the quick allocator for that size. mm_nqalist (5 34 7 128) This is the number of quick allocators that will be used. This determines the size of the largest request that can be destined for the quick allocators. max quick allocator size = mm_nqalist * ( 1 << mm_asizshft ) - overhead overhead in this case is the size of a pointer which is 4 bytes on a 32 bit system and 8 bytes on a 64 bit system. mm_maxslots (120 120 30 45) Quick allocator block sizes grow as a proportion of the number of requests for a quick allocator. This will be pegged to mm_maxslots to avoid creating quick allocator blocks that are too large. The larger the block the less likely it all quick allocations in it will be free'd and hence it becomes locked into a quick allocator instead of potentially returning it to the main pool. mm_minslots (10 5 20 2) This is the minimum number of slots allowed for a quick allocator block. If there are too few allocations then the computed number of slots may be too small so this will limit the minimum value. mm_adjfactor (12 16 24 6) This if the quick block size adjustment factor. This is used to scale the number of `extensions' for a slot to the number to allocate for a new extension. Extension is allocation of a new large block. mm_adjfactor is normalized at 16. An mm_adjfactor value of 16 is a scale of 1.0. e.g. setting the mm_adjfactor to 8 ( making it a scale of 0.5 ) will evaluate the number of slots in a new extension to be 0.5 * the number of slots extended so far for a quick allocator. mm_qblkmult (64 256 8 4096) On top of all the minimum and maximum settings, the size of the quick allocator blocks are rounded up to the nearest multiple of mm_qblkmult. mm_qblkmult must be a power of 2 otherwise MMmalloc will fail. mm_nszmult (8 16 8 2048) This is the granularity adjustment for the normal allocator. All allocations in the normal allocator will be rounded to this value. This must be a power of 2. Setting this value to less than 4 will cause MMmalloc to fail. mm_xf[0].mm_flindx (120 90 110 63) mm_xf[0].mm_szshft ( 5 5 4 11) mm_xf[1].mm_flindx (127 105 125 94) mm_xf[1].mm_szshft ( 12 12 11 17) mm_xf[2].mm_flindx (129 109 129 110) mm_xf[2].mm_szshft ( 18 16 17 22) The above values determine the free list number selection. These are used to determine what free list index to search for an allocation request of a given size. There are three pairs of mm_flindx and mm_szshft parameters. Each set determines an allocation tier. The size requested is shifted by mm_szshft bits to the right ( division by 2^mm_szshft ) and the resultant value is compared to mm_flindx to determine if the size belongs on the next tier or the value is adequate for an index. On the last tier, sizes that are beyond are dropped into the last free list. It is important that the following relation is held true otherwise unpredictable results will occur : mm_xf[N].mm_flindx*(1 << mm_xf[N].mm_szshft) < (1 << mm_xf[N+1].mm_szshft) Setting parameters such that (mm_xf[0].mm_flindx << 1) is less than mm_nszmult will cause unused free lists to be created which is a waste. mm_zeromalloc (0 0 0 0) This determines the behavior of malloc( 0 ). mm_zeromalloc=0 will force malloc to return ( void * ) 0. mm_zeromalloc=1 will force malloc to return a pointer that can be used to call free(). Any dereferences to the pointer will have undefined behavior. mm_maxsrch (20 50 20 512) This determines the number of attempts to search a free list before giving up and using a larger free list. Larger values will persist in searching a free list and hence trade off speed with memory. mm_brkmult (8192 16384 16384 16384) When memory is requested from the system it will be requested in multiples of this size. It is important that this value be a power of 2. mm_minrmdr (32 32 16 16) This determines how free blocks are partitioned the smallest. If the remaining size in a free block is less than this size after partitioning it is not partitioned and memory is assumed allocated to the requested block. It is important that this value is not smaller than the size of 4 pointers. 16 bytes on a 32 bit system and 32 bytes on a 64 bit system. It can also be used to reduce micro-fragmentation of the arena. mm_gcthresh (81920 1048576 0 1048576) After an unsuccessful search of a free list, this value is used to determine is a coalesce operation should be performed on the free list just searhed. If more than mm_gcthresh bytes have been freed since the last time a coalesce was performed and more than mm_gcthresh are are currently unused in the arena then a coalesce will take place. mm_missthrash (6 10 4 2) This determines the number of allocations on a dirty pivot that will be satisfied before a rehash of the free list is computed. mm_flsearh (20 10 60 10) This determines the number of empty pivot free lists that will be tolerated before a rehash of the free lists. mm_reallthrsh (128 128 64 256) This is used by realloc to determine what sizes will be reallocated in the normal arena instead of the quick arena. mm_nunmaplists (16 8 8 8 ) This determines the number of unmapped free lists used. These will be used by recycle to keep unmapped regions from the main arena but will allocate from these regions before calling sbrk() again. mm_szunmapshift (14 12 12 12) This determines the sizes of the unmapped lists. This will be used in a similar way to mm_xf[0].mm_szshft but there is only one tier. mm_unmapalign (4096 4096 4096 4096) This must be equal to the pagesize of the system. This will be used to determine which boundaries are suitable to mmap/munmap. mm_minunmapsize (8192 4096 4096 4096) This determines the smallest region that will be unmapped. mm_minunmapsrch (0 0 0 0) This determines the smallest free list size that will be used for coalescing by recycle. mm_minfreerecy (0 0 0 0) This is used by rerecycle to determine how many bytes must be available before it will call recycle. mm_debug_on (0 0 0 0) mm_debug_on sets debug mode on if it is 1 and off if 0. Setting this to 1 is the same as setting MALLOC_DEBUG to 1. This will force a malloc_check to be performed on every entry to MMmalloc. mm_abort_badfree (0 0 0 0) This determines behavior of a free of an already free'd block. Setting mm_abort_badfree to 1 will cause MMmalloc to core dump on a free of an unidentifiable block. Setting this value to 2 will force mmalloc print messages regarding the error on standard error (stderr). mm_clronfree (0 0 0 0) mm_valclronfree (0 0 0 0) mm_clronfree can be set to 1 which will enable the clear on free option as in mallopt. mm_valclronfree will be used as the value to use. mm_reportrecy (0 0 0 0) Setting mm_reportrecy to 1 will disable rerecycle reporting to the /tmp/MMmalloc.log file. The MM_CONFIG structure can also be used to determine compile time parameters for the malloc configuration. New Functions in MMmalloc ------------------------- size_t recycle( void ) size_t arecycle( void * arena ) This will coalesce all free lists with sizes larger than a threshold (mm_minunmapsrch) and unmap pages that are completely free and larger than a threshold (mm_minunmapsize). Recycle will call _mm_munmap and _mm_mmapmem to map and unmap memory in the arena. These are provided as weak symbols and can be overridden by the application to perform a similar but perhaps different operation. The same functions will be called by arecycle. void rerecycle( void ) This is a signal handler safe version of recycle. It is not guaranteed that recycle will have been called on return from rerecycle but it will be called either by the signal handler or the process that currently has the malloc lock before it returns from the malloc library call. There is no arerecycle() entry point. rerecycle() will also write a summary line in /tmp/MMmalloc.log if it mm_reportrecy is set to 0 and mm_minfreerecy bytes have been free'd since the last call to recycle(). The following is an example of output from PID 27888, the process was able to recycle/unmap 139264 bytes from the address space. It total, 196608 bytes have been allocated by sbrk and 49240 bytes are still allocated to the application. Memory left is either free or overhead in MMmalloc. 27888 - recycled 139264 sbrk 196608 unmapped 139264 aloc 49240 void malloc_check( void ) void amalloc_check( void * arena ) This will validate the malloc arena data structures to make sure there is no corruption. On failure, malloc_check will NOT return to the caller but will force a SIGSEGV signal. If malloc_check was able to identify a problem it will write an error message to standard error ( file descriptor 2 - not stderr ). Bad quick slot : overwrite before allocated mem **** MMmalloc panic identifier 0x10000a16 **** Possible problem/corrupted addresses 0x10056744 or 0x10053b20 Dumping core Process 25934 (mstress.mm) Segmentation fault (Signal 11) The message will usually have a likely cause of error after a ':'. It is impossible to determine exactly what the problems are so take this likely cause of error as a guess only. The following is output using the MMdbx dbx script. This is output from MMdbx on the error corresponding to the failure above. 0x5fff0934 = "Bad quick slot : overwrite before allocated mem\n" 0x100572e4 0x10056b88 0x10000a16 2 1 There will be two likely problem addresses printed that could be the corrupted words. The most likely problem address is the first one printed. In some cases the second address is the address being referenced. The rest are ID's so that the problem can be traced to a line and state in MMmalloc. malloc_check is very exhaustive but does not detect an invalid virtual address before accessing it in all cases. For some errors, malloc_check will be forced to reference an invalid address and it will not output any error messages before dumping core. However you may still be able to run the MMdbx script from dbx on the resultant core and get meaningfull diagnostics as described above. It is possible to call malloc_check from the debugger but is only reliable if the application is not currently within libmalloc. malloc_check also marks and unmarks structures in the arena so it is unwise to capture the malloc_check SIGSEGV signal and continue operation since all marks may have not been unmarked. struct mallinfo qmallinfo( void ) struct mallinfo aqmallinfo( void * arena ) This is a quick version of mallinfo() that will only fill the following fields of the struct mallinfo record :- arena usmblks uordblks fordblks These are provided by running totals accumulated by MMmalloc and do not require walking the entire arena to compute. void * malignoff( size_t alignment, size_t offset, size_t size ) void * amalignoff( size_t alignment, size_t offset, size_t size, void * arena ) This is a more general version of memalign. It allows you to provide an offset into the allocated memory that is where the alignment is desired. Values for alignment must be a power of 2. malignoff does have a larger memory use overhead than malloc so it should only be used sparingly. void addmem( void * p_mem, size_t size ) void aaddmem( void * p_mem, size_t size, void * arena ) This allows the application to add any memory to an arena. int _mm_defaultoptions = N; This allows the application to override the default configuration being by having _mm_defaultoptions set before any call to malloc. New mallopt Commands -------------------- M_ASSIGNLOCK This is only available to amallopt. It allows the assignment of a lock to an arena. This is the same lock as created by acreate() but it can be assigned to the arena at any time. This only effects the amalloc, afree, arealloc, acalloc, amallopt, arecycle, amallinfo interface. For controlling threading on malloc, free, realloc, calloc, mallopt, mallinfo, recycle, qmallinfo, malloc_check etc you should investigate the usconfig function with the parameters CONF_STHREADMALLOCOFF or CONF_STHREADMALLOCON. It is also up to the user to free the lock when nessasary. M_USERDATA This allows the user to set a data field in the arena structure. This value is later passed to _mm_munmap and _mm_mmapmem during a call to recycle. Reporting New Problems ---------------------- In the meantime, you may contact me directly via email if you discover any problems with MMmalloc. "Gianni Mariani" <gianni@neu.sgi.com> Known Problems or Bugs ---------------------- recycle() will create a large number of virtual memory regions. IRIX 5.2 has an O(N) algorithm for handling page faults with respect to the number of mapped regions so the performance will degrade as more and mapped regions are created. This is a problem for any application making extensive use of mmap and munmap and creating large numbers of regions. This problem may be corrected in future releases of IRIX. addmem() touches the last page of memory being added making it unsuitable for large AUTOGROW mapped files. libmalloc's memalign() behaves sensibly if an alignment other than a power of two is given while MMmalloc will fail as will libc's malloc. Do not install MMmalloc on IRIX system release befores 5.2 as you may expose some nasty bugs in the memory system. malloc_check does not always write an error to standard error. Small shared arenas do not have a reasonable MMmalloc configuration. API for configuring MMmalloc parameters is very weak. .3f04
Source
Documentation